Abstract

The time-dependent angular distributions of decays of neutral B 
mesons into two vector mesons contain information about the life- 
times, mass differences, strong and weak phases, form factors, and CP 
violating quantities. A statistical analysis of the information content 
is performed by giving the "information" a quantitative meaning. It 
is shown that for some parameters of interest, the information con- 
tent in time and angular measurements combined may be orders of 
magnitude more than the information from time measurements alone 
and hence the angular measurements are highly recommended. The 
method of angular moments is compared with the (maximum) like- 
lihood method to find that it works almost as well in the region of 
interest for the one-angle distribution. For the complete three-angle 
distribution, an estimate of possible statistical errors expected on the 
observables of interest is obtained. It indicates that the three-angle 
distribution, unraveled by the method of angular moments, would be 
able to nail down many quantities of interest and will help in pointing 
unambiguously to new physics. 
1 Introduction



Among the available methods for studying CP violation, the decay modes 
of B mesons into two vector mesons, both of which decay into two particles 
each, are very promising mainly because of the larger number of observ- 
ables at one's disposal through the angular distributions of the decays [0. A 
disadvantage of having a large number of observables may be the difficulty 
in separating them from one another because of the correlations between 
them. The method of angular moments [0, |3| helps in extracting the observ- 
ables from the angular distributions by using judiciously chosen weighting 
functions. From the time evolutions of these observables, it is then possible 
to extract the information about the lifetimes, mass differences, strong and 
weak phases, form factors, and CP violating quantities. 

Here we will concentrate on the decays of the type
$B \to V_1(\to X_1 Y_1)\,V_2(\to X_2 Y_2)$, where $B$ is a neutral B meson,
$V_1$ and $V_2$ are vector mesons and $X_1, X_2, Y_1, Y_2$ are the four final
state particles. We shall illustrate the technique by using the particular
decay $B_s \to J/\psi(\to \ell^+\ell^-)\,\phi(\to K^+K^-)$. The other decay
modes of the form $B \to VV$ might have different angular distributions, and
the method of angular moments will need correspondingly different weighting
functions (which can always be found [[|), but the observables in all these
decay modes are the same. In addition, the $B_s \to J/\psi\phi$ decay holds
the promise of being able to measure the lifetimes of $B_s^L$ and $B_s^H$
separately, and, if this lifetime difference is sizeable (as estimated in
[Q), the prospect of measuring CP-violating quantities even without
tagging ||. By quantifying
the information content in the data we can judge the relative importance of 
the measurement of various possible quantities. The approach we have used 
to analyze the information content may be used in modes of decay other than 
the one we have considered here. 

Generally speaking, in any experiment the amount of information ob- 
tained depends on 

• what quantities are recorded 

• what numerical summaries of the recorded data were used 

• the number of data points 

• the parameter values governing the outcome of the experiment 

Since only the first two are under the direct control of the experimentalist,
we will address those two issues in this paper. We argue that the expected
information per observation available in the $B_s \to J/\psi\phi$ decay is
substantially more when both time and angular information are used instead
of using the time information alone. Moreover, we show that the method of
angular moments, when used to summarize and estimate the parameters, is
computationally easy to implement and efficient (in the statistical sense) in
extracting information from the data.

In Sec. 2, we give the angular distribution and the time evolutions of the
observables for the decay $B_s \to J/\psi\phi$. The definition of information
in the data about a parameter value that we will use is standard in the
statistical literature and will be described briefly in Appendix A. Sec. 3
outlines why the angular information may be useful and then follows up with
an analytic investigation of the additional information in the transversity
angle over and above the time information. In Sec. 4, we discuss the
efficiency of the method of angular moments by comparing it with the maximum
likelihood method in the case of the transversity angle distribution. In
Sec. 5 we carry out a simulation study of the method of angular moments for
extracting the relevant parameters from the three-angle distribution. Sec. 6
concludes.



2 Angular Distributions and Time Evolutions 
of Observables 

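The most general decay amplitude for $B \to VV$ takes the form || [7]

\[
A(B_q(t) \to V_1 V_2) = \frac{A_0(t)}{x}\,\epsilon^{*L}_{V_1}\epsilon^{*L}_{V_2}
 - A_\parallel(t)\,\epsilon^{*T}_{V_1}\cdot\epsilon^{*T}_{V_2}/\sqrt{2}
 - i A_\perp(t)\,\epsilon^{*}_{V_1}\times\epsilon^{*}_{V_2}\cdot\hat{p}_{V_2}/\sqrt{2} ,
\tag{1}
\]

where $x = p_{V_1}\cdot p_{V_2}/(m_{V_1} m_{V_2})$ and $\hat{p}_{V_2}$ is the
unit vector along the direction of motion of $V_2$ in the rest frame of
$V_1$. Here the time dependences originate from $B_q$-$\bar{B}_q$ mixing. In
our notation only a $B_q$ meson is present at $t = 0$.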

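For angles, we will use the same conventions as in Ref. 0], i.e. the $\phi$
moves in the $x$ direction in the $J/\psi$ rest frame, the $z$ axis is
perpendicular to the decay plane of $\phi \to K^+K^-$, and $p_y(K^+) > 0$.
The coordinates $(\theta, \varphi)$ describe the decay direction of $\ell^+$
in the $J/\psi$ rest frame and $\psi$ is the angle made by $p(K^+)$ with the
$x$ axis in the $\phi$ rest frame. With this convention,

\[
\mathbf{x} = \mathbf{p}_\phi , \qquad
\mathbf{y} = \frac{\mathbf{p}_{K^+} - \mathbf{p}_\phi(\mathbf{p}_\phi\cdot\mathbf{p}_{K^+})}
                  {|\mathbf{p}_{K^+} - \mathbf{p}_\phi(\mathbf{p}_\phi\cdot\mathbf{p}_{K^+})|} , \qquad
\mathbf{z} = \mathbf{x}\times\mathbf{y} ,
\]
\[
\sin\theta\cos\varphi = \mathbf{p}_{\ell^+}\cdot\mathbf{x} , \qquad
\sin\theta\sin\varphi = \mathbf{p}_{\ell^+}\cdot\mathbf{y} , \qquad
\cos\theta = \mathbf{p}_{\ell^+}\cdot\mathbf{z} .
\tag{2}
\]

Here boldface characters represent unit 3-vectors and everything is measured
in the rest frame of $J/\psi$. Also

\[
\cos\psi = -\mathbf{p}'_{K^+}\cdot\mathbf{p}'_{J/\psi} ,
\tag{3}
\]

where the primed quantities are unit vectors measured in the rest frame of
$\phi$.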

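With this convention, the three-angle distribution is given by [3], [7]

\[
\frac{d^3\Gamma[B_s(t)\to J/\psi(\to \ell^+\ell^-)\,\phi(\to K^+K^-)]}
     {d\cos\theta\; d\varphi\; d\cos\psi}
\propto \frac{9}{32\pi}\Big[\, 2|A_0(t)|^2\cos^2\psi\,(1-\sin^2\theta\cos^2\varphi)
\]
\[
\quad + \sin^2\psi\,\big\{ |A_\parallel(t)|^2(1-\sin^2\theta\sin^2\varphi)
 + |A_\perp(t)|^2\sin^2\theta
 - \mathrm{Im}\,(A_\parallel^*(t)A_\perp(t))\,\sin 2\theta\,\sin\varphi \big\}
\]
\[
\quad + \frac{1}{\sqrt{2}}\sin 2\psi\,\big\{ \mathrm{Re}\,(A_0^*(t)A_\parallel(t))\,\sin^2\theta\,\sin 2\varphi
 + \mathrm{Im}\,(A_0^*(t)A_\perp(t))\,\sin 2\theta\,\cos\varphi \big\}\Big] .
\tag{4}
\]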
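
As a sanity check on the normalization of Eq. (4), the following sketch integrates each of the six angular functions over the full angular range with a simple midpoint rule. It assumes the ranges $-1 \le \cos\theta, \cos\psi \le 1$ and $0 \le \varphi < 2\pi$ and an overall factor $9/(32\pi)$; the function names are illustrative and are not part of the original analysis. Under these assumptions the three squared-amplitude terms should each integrate to one and the three interference terms to zero.

```python
import math

def angular_terms(cos_theta, phi, cos_psi):
    """Angular functions multiplying the six amplitude bilinears of Eq. (4).

    theta, phi: lepton angles in the J/psi rest frame; psi: kaon angle.
    The overall factor 9/(32*pi) is an assumed normalization.
    """
    sin2_theta = 1.0 - cos_theta ** 2
    sin_2theta = 2.0 * math.sqrt(sin2_theta) * cos_theta
    sin2_psi = 1.0 - cos_psi ** 2
    sin_2psi = 2.0 * math.sqrt(sin2_psi) * cos_psi
    norm = 9.0 / (32.0 * math.pi)
    root2 = math.sqrt(2.0)
    return (
        norm * 2.0 * cos_psi ** 2 * (1.0 - sin2_theta * math.cos(phi) ** 2),  # |A_0|^2
        norm * sin2_psi * (1.0 - sin2_theta * math.sin(phi) ** 2),            # |A_par|^2
        norm * sin2_psi * sin2_theta,                                         # |A_perp|^2
        -norm * sin2_psi * sin_2theta * math.sin(phi),                        # Im(A_par* A_perp)
        norm * sin_2psi * sin2_theta * math.sin(2.0 * phi) / root2,           # Re(A_0* A_par)
        norm * sin_2psi * sin_2theta * math.cos(phi) / root2,                 # Im(A_0* A_perp)
    )

def integrate_terms(n=40):
    """Midpoint-rule integral of each angular function over all three angles."""
    totals = [0.0] * 6
    d_cos = 2.0 / n
    d_phi = 2.0 * math.pi / n
    for i in range(n):
        cos_theta = -1.0 + (i + 0.5) * d_cos
        for j in range(n):
            phi = (j + 0.5) * d_phi
            for k in range(n):
                cos_psi = -1.0 + (k + 0.5) * d_cos
                for m, value in enumerate(angular_terms(cos_theta, phi, cos_psi)):
                    totals[m] += value
    cell = d_cos * d_phi * d_cos
    return [total * cell for total in totals]
```

Calling `integrate_terms()` returns six numbers in the order listed in the comments; the midpoint rule is only accurate to a fraction of a percent at this grid size.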

The time evolutions of the coefficients of the angular terms are given in
Table 1. Here $\Gamma_L$ and $\Gamma_H$ are the widths of the light and heavy
$B_s$ mass eigenstates, $B_s^L$ and $B_s^H$ respectively, and $\Delta m$ is
the mass difference between them. $\bar\Gamma$ is the average of $\Gamma_L$
and $\Gamma_H$. Here $\delta_1 = \mathrm{Arg}(A_\parallel^*(0)A_\perp(0))$ and
$\delta_2 = \mathrm{Arg}(A_0^*(0)A_\perp(0))$ are the strong phases, and
$\delta\phi \approx 2\lambda^2\eta$ is related to an angle of a (squashed)
unitarity triangle ||, which is very small in the standard model
($\approx 0.03$). We will denote the values of $A_X(0)$ (where
$X \in \{0, \parallel, \perp\}$) simply as $A_X$ in the rest of the paper.

The 'transversity angle' $\theta$ separates the CP-even and CP-odd decays. If
we integrate over the remaining two angles and include the time dependence
explicitly, the angular distribution in Eq. (4) becomes

\[
\frac{d^2\Gamma}{d\cos\theta\; dt} \propto
(|A_0|^2 + |A_\parallel|^2)(1+\cos^2\theta)\,e^{-\Gamma_L t}
 + 2|A_\perp|^2\sin^2\theta\; e^{-\Gamma_H t} ,
\tag{5}
\]

or, in the form of a normalized probability distribution,

\[
p(u,t\,|\,\beta,\Gamma_H,\Gamma_L) =
\frac{3}{8}\,\beta\,\Gamma_L(1+u^2)\,e^{-\Gamma_L t}
 + \frac{3}{4}\,(1-\beta)\,\Gamma_H(1-u^2)\,e^{-\Gamma_H t} ,
\tag{6}
\]

where $u = \cos\theta$ and

The corresponding value of $\beta$ in the $B \to J/\psi K^*$ mode is measured
[§, 10] to be 0.93 ± 0.03, so with conservative estimates for the breaking of
flavour SU(3) symmetry, the value of $\beta$ is expected to lie between 0.8
and 1.0.



3 Information in the transversity angle distribution

By considering the case when the time ($t$) and transversity angle ($\theta$)
measurements are available, we will argue, in this section, that the
additional information in $\theta$ is substantial and worth the extra effort
put into the angular measurements. In Section 3.1 we will explain what makes
the estimation of $\Gamma_L - \Gamma_H$ hard and why gathering the
transversity angle data in addition to time is attractive. In Section 3.2 we
will analyze the information content analytically and determine the numerical
magnitude of the information gain.

3.1 Why collect angular information? 

If the objective is to estimate $\Gamma_L - \Gamma_H$, which is the difference
between the reciprocals of the mean lifetimes, one may ask: how is it possible
that the angular data are useful?

Let us first consider the estimation of the parameters when we have only
time information available. In that case the distribution of the lifetime
($t$) is given by

\[
p(t\,|\,\beta,\Gamma_H,\Gamma_L) = \beta\,\Gamma_L e^{-\Gamma_L t}
 + (1-\beta)\,\Gamma_H e^{-\Gamma_H t} ,
\tag{8}
\]

which is a mixture of the two lifetimes. With probability $\beta$ we observe
the lifetime of a particle with mean lifetime $1/\Gamma_L$, and with
probability $(1-\beta)$ we observe the lifetime of a particle with mean
lifetime $1/\Gamma_H$. The expectation of the observed (mixed) lifetime is
$\beta/\Gamma_L + (1-\beta)/\Gamma_H$. Since

\[
\left(\frac{\beta}{\Gamma_L} + \frac{1-\beta}{\Gamma_H}\right)
\times \big(\beta\Gamma_L + (1-\beta)\Gamma_H\big)
 = 1 + \beta(1-\beta)\left(\sqrt{\frac{\Gamma_L}{\Gamma_H}}
   - \sqrt{\frac{\Gamma_H}{\Gamma_L}}\right)^2 ,
\tag{9}
\]

the derived parameter $\beta\Gamma_L + (1-\beta)\Gamma_H$ (which we will later
call $\theta_1$) is, to first order (when the $\Gamma$'s are close to each
other), the reciprocal of the expected mean observed lifetime. Thus the
estimation of $\beta\Gamma_L + (1-\beta)\Gamma_H$, which is a "mean"
parameter, is not hard even if we cannot "guess" the decay type.
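
The identity in Eq. (9) can be checked numerically. The sketch below evaluates both sides for illustrative parameter values (the numbers $\beta = 0.9$, $\Gamma_L = 1$, $\Gamma_H = 0.8$ are assumptions chosen for the example, and the function names are hypothetical).

```python
import math

def mean_lifetime_product(beta, gamma_l, gamma_h):
    """Left-hand side of Eq. (9): (expected lifetime) x (theta_1)."""
    mean_lifetime = beta / gamma_l + (1.0 - beta) / gamma_h
    theta1 = beta * gamma_l + (1.0 - beta) * gamma_h
    return mean_lifetime * theta1

def identity_rhs(beta, gamma_l, gamma_h):
    """Right-hand side of Eq. (9)."""
    r = math.sqrt(gamma_l / gamma_h)
    return 1.0 + beta * (1.0 - beta) * (r - 1.0 / r) ** 2

# Illustrative values: both sides agree and differ from 1 by only 0.45%,
# so theta_1 is very nearly the reciprocal of the mean observed lifetime.
lhs = mean_lifetime_product(0.9, 1.0, 0.8)
rhs = identity_rhs(0.9, 1.0, 0.8)
```

For these values the mean lifetime is 1.025 and $\theta_1 = 0.98$, so the product is 1.0045, matching $1 + 0.09 \times 0.05$ on the right-hand side.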

If we knew what type of decay each time measurement was coming from, 
then we could estimate $\Gamma_L$ and $\Gamma_H$ separately by using the reciprocal of the
sample mean lifetimes of the two kinds of decays. We can then construct
an estimate for $\Gamma_L - \Gamma_H$. Given enough observations of both types, we can
get good estimates for the difference of the lifetimes. However, the identity 
of the decay type is not known in real data and statistical procedures have 
to, at least indirectly, guess it as well as possible from the available data. 
When the two component lifetimes are widely different, then the observed 
lifetime is a good clue as to the identity of the decay type. However, when 
the lifetimes are close to each other, the clues in the time signature alone are 
not decisive. 

If only the transversity angle, $\theta$, is measured, then the density of the
data $u = \cos\theta$ is given by

\[
p(u\,|\,\beta) = \frac{3}{8}\,\beta\,(1+u^2) + \frac{3}{4}\,(1-\beta)(1-u^2) .
\tag{10}
\]

The distribution of the angles is very dependent on the type of the decay; 
therefore, by observing the angle alone one can have a fair idea as to what 
kind of decay has been observed. This is why the angular information is 
useful, even though it is not directly about lifetimes. 

As an illustrative example, consider the case when the two lifetimes are
equally likely, i.e. $\beta = 0.5$ (see Fig. [3]). If $\Gamma_H/\Gamma_L = 1$
and only time measurements are available, then we have no way of "guessing"
what kind of decay we have observed. This is reflected in the fact that the
a priori probability (the probability before making the measurement) that the
decay is of the first type is the same as the a posteriori probability (the
probability after making the measurement) and equal to 0.5. On the other
hand, if the decay widths are dramatically different, say
$\Gamma_H/\Gamma_L = 20$, then the time measurement provides a good clue: if
the time observation is very large, then it is more likely that the decaying
particle is the one with the smaller decay width and if the time measurement
is very small, then it is more likely that the decaying particle has the
larger decay width.
If only angular measurements are available, then when $u = \cos\theta$ is
close to $-1$ or $+1$, there is a higher chance that the particle measured was
of the first type because it has the angular distribution
$\frac{3}{8}(1+u^2)$, which implies more probability for large values of
$|u|$. Note that the power to discriminate between the two kinds of decay by
observing the transversity angle is not affected by the ratio of the decay
widths. When we have both time and angular information, we will be able to
benefit from the information contained in both, which will be at least as
much as the information in the angle alone.
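
The "guessing" argument above amounts to a Bayes-rule computation. The following sketch, with hypothetical function names, returns the a posteriori probability that an event belongs to the first component (weight $\beta$, width $\Gamma_L$, angular shape $\frac{3}{8}(1+u^2)$) given either a time or an angle measurement.

```python
import math

def posterior_from_time(t, beta, gamma_l, gamma_h):
    """P(first component | decay time t) for the two-exponential mixture."""
    first = beta * gamma_l * math.exp(-gamma_l * t)
    second = (1.0 - beta) * gamma_h * math.exp(-gamma_h * t)
    return first / (first + second)

def posterior_from_angle(u, beta):
    """P(first component | u = cos(theta)) for the transversity-angle mixture."""
    first = beta * 0.375 * (1.0 + u * u)
    second = (1.0 - beta) * 0.75 * (1.0 - u * u)
    return first / (first + second)

# Equal widths: the time measurement does not move the prior of 0.5.
p_equal_widths = posterior_from_time(3.0, 0.5, 1.0, 1.0)
# The angle is informative regardless of the widths:
p_edge = posterior_from_angle(1.0, 0.5)     # |u| = 1 can only come from (1 + u^2)
p_centre = posterior_from_angle(0.0, 0.5)   # u = 0 favours the (1 - u^2) component
```

With $\beta = 0.5$ the posterior from the angle ranges from 1/3 at $u = 0$ to 1 at $|u| = 1$, whatever the ratio of the widths.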

The heuristic ideas above are graphically presented in Fig. 1. The x-axis
was chosen to be the percentile of the observed data (time or angle, as
the case may be) so that we can plot the different scenarios on the same scale.
Additionally the plot has the desirable property that all points along the
x-axis occur with equal probability for all the four scenarios considered.
(This is because the percentile of the observed data is just 100 times the
probability integral transform$^3$ of the data point.) Thus we can visually
look at the four curves and compare how much they deviate in either direction
from the line $y = 0.5$ to get an idea as to how well the data predict the
kind of decay.

The curve corresponding to $\Gamma_H/\Gamma_L = 1.2$ when only time is
measured is closer to the line $y = 0.5$ than the curve corresponding to when
only angular measurements are taken. This implies that when the decay widths
are close (for example when the ratio is 1.2), the information in the angular
data alone is greater than that in the time data alone. Only in extreme
cases, such as when the ratio of the decay widths is 20, can we predict well
on the basis of time alone.

3.2 A theoretical investigation of information content 

In this subsection, we will consider the problem of extracting information
from a mixture of two distributions with the density of observations of the
form

\[
p(x\,|\,\lambda_1,\lambda_2,\beta) = \beta\, g_1(x|\lambda_1) + (1-\beta)\, g_2(x|\lambda_2) .
\tag{11}
\]

3 If $X$ is a continuous random variable with density function $f(x)$, then the function
$F(x) = \int_{-\infty}^{x} f(t)\,dt = P(X \le x)$ defines the probability integral transform. Then the
random variable $F(X)$ is uniformly distributed over the interval [0, 1].
The data with or without angular information has this form ($x$ denotes the
data from a single observation and may be a vector). According to the
equation above, the data come from the distribution $g_1(x|\lambda_1)$ with
probability $\beta$ and from the distribution $g_2(x|\lambda_2)$ with
probability $(1-\beta)$. This setup is more general than the one we have, but
it enables us to analyze the phenomenon with greater clarity and it is also
applicable to data-collection scenarios other than $B \to VV$. For the
$B \to VV$ case, $\lambda_1 = \Gamma_L$ and $\lambda_2 = \Gamma_H$.

When only time information is available,

\[
x = t, \qquad g_1(x|\lambda_1) = \lambda_1 \exp(-\lambda_1 t), \qquad
g_2(x|\lambda_2) = \lambda_2 \exp(-\lambda_2 t) .
\tag{12}
\]

When both time and angular information are available,

\[
x = (t, u),
\]
\[
g_1(x|\lambda_1) = \frac{3}{8}(1+u^2)\,\lambda_1 \exp(-\lambda_1 t) ,
\tag{13}
\]
\[
g_2(x|\lambda_2) = \frac{3}{4}(1-u^2)\,\lambda_2 \exp(-\lambda_2 t) .
\tag{14}
\]

When only time information is collected, the functions $g_1(\cdot)$ and
$g_2(\cdot)$ are the same. When both time and angular information are
recorded, they will be different.

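The expected information matrix (see Appendix A) for the parameters
$\lambda = (\lambda_1, \lambda_2, \beta)$ can be found to be

\[
I(\lambda) =
\begin{pmatrix}
\beta^2 \int A^2\, d\mu & \beta(1-\beta) \int AB\, d\mu & \beta \int AC\, d\mu \\
\beta(1-\beta) \int AB\, d\mu & (1-\beta)^2 \int B^2\, d\mu & (1-\beta) \int BC\, d\mu \\
\beta \int AC\, d\mu & (1-\beta) \int BC\, d\mu & \int C^2\, d\mu
\end{pmatrix}
= \int [v, v]\, d\mu ,
\tag{15}
\]

\[
\text{where}\quad A = g_1'(x|\lambda_1), \quad B = g_2'(x|\lambda_2), \quad
C = g_1(x|\lambda_1) - g_2(x|\lambda_2) ,
\tag{16}
\]

the prime denotes differentiation with respect to the parameter,
$v = (\beta A,\ (1-\beta)B,\ C)$, $[\cdot, \cdot]$ denotes outer product and
$\int d\mu$ denotes integration with respect to the measure
$dx/p(x|\lambda_1, \lambda_2, \beta)$.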
If we are interested in the difference of the two $\lambda$'s then we may be
interested in the following derived parameterization of the problem:

\[
\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}
 = \begin{pmatrix} \beta\lambda_1 + (1-\beta)\lambda_2 \\ \lambda_1 - \lambda_2 \\ \beta \end{pmatrix} .
\]

In that case the information matrix for $\theta$ will be

\[
I(\theta) = \int [w, w]\, d\mu
\tag{17}
\]
\[
= \begin{pmatrix}
\int (\beta A + (1-\beta)B)^2\, d\mu & * & * \\
* & (\beta(1-\beta))^2 \int (A-B)^2\, d\mu & * \\
* & * & \int \big((\lambda_2 - \lambda_1)(\beta A + (1-\beta)B) + C\big)^2\, d\mu
\end{pmatrix} ,
\tag{18}
\]

where

\[
w = \begin{pmatrix}
\beta A + (1-\beta) B \\
\beta(1-\beta)(A - B) \\
(\lambda_2 - \lambda_1)(\beta A + (1-\beta)B) + C
\end{pmatrix} .
\]

Although the entries marked with a * are important, they are omitted for the
sake of brevity, since the qualitative features of the information matrix are
clear without them. Careful inspection of the entries in the above matrix in
(18) reveals some qualitative features of the dependence of the information
content in the data on the parameter values.

The information on $\theta_2 = \lambda_1 - \lambda_2$ is low when
$A \approx B$ or if $\beta(1-\beta) \approx 0$. If $g_1 = g_2$ and
$\lambda_1 \approx \lambda_2$, then $A \approx B$. This is what happens when
we have only time data and the two $\Gamma$'s are close to each other. If
$\lambda_1 \approx \lambda_2$, but $g_1 \neq g_2$, then this problem does not
occur. In fact, if $g_1$ and $g_2$ are very different functions then even if
$\lambda_1 \approx \lambda_2$, we can recover information about
$\lambda_1 - \lambda_2$ from the data. When both time and angle data are
collected, the component densities as in (13) and (14) are well-separated and
there is a lot more information on $\Gamma_H - \Gamma_L$ than there would have
been with time data alone. If most of the observations are from one component
of the mixture, the information on $\lambda_1 - \lambda_2$ is small, since
$\beta(1-\beta) \approx 0$.

The estimation of $\beta\Gamma_L + (1-\beta)\Gamma_H$, on the other hand, is
not affected much by the separation of densities since it is a "mean"
parameter, as shown in Sec. 3.1.
As mentioned in Appendix A, the inverse of the expected information
matrix also gives the approximate variance matrix of the maximum likelihood
estimates in large samples. If $I(\theta)$ is the expected information matrix
from a single sample, $nI(\theta)$ is the expected information matrix based on
a sample of size $n$. Hence the diagonal elements of the matrix

\[
[nI(\theta)]^{-1} = \frac{1}{n}\,[I(\theta)]^{-1}
\tag{19}
\]

will give us the approximate variance, $V(\hat\theta_i)$, of the maximum
likelihood estimates, $\hat\theta_i$ ($i = 1, 2, 3$), in samples of size $n$
when $n$ is large.

Let $\hat\theta_i(t)$ denote the maximum likelihood estimate of $\theta_i$
given time data alone and let $\hat\theta_i(t,u)$ denote the estimate using
both time and angular information. By calculating the inverses of the
corresponding expected information matrices, one can calculate the ratios

\[
\frac{V(\hat\theta_i(t))}{V(\hat\theta_i(t,u))}
\]

for $i = 1, 2, 3$. Figs. 2-4 show the plots of these ratios for various values
of $\beta$ and $\Gamma_H/\Gamma_L$.
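
The variance ratios described above can be reproduced approximately by numerical integration. The sketch below is one possible implementation, not the authors' code: it builds $I(\theta)$ from the outer product of the vector $w$ using composite Simpson quadrature, truncates the time integral at an assumed `t_max`, and inverts the 3x3 matrix by cofactors. All function names and the grid sizes are assumptions made for illustration.

```python
import math

def simpson_nodes(a, b, n):
    """Composite Simpson (point, weight) pairs on [a, b]; n must be even."""
    h = (b - a) / n
    nodes = []
    for k in range(n + 1):
        if k == 0 or k == n:
            weight = h / 3.0
        elif k % 2 == 1:
            weight = 4.0 * h / 3.0
        else:
            weight = 2.0 * h / 3.0
        nodes.append((a + k * h, weight))
    return nodes

def information_theta(lam1, lam2, beta, use_angle, t_max=60.0, n_t=2000, n_u=20):
    """Per-event information matrix for
    theta = (beta*lam1 + (1-beta)*lam2, lam1 - lam2, beta)."""
    # Without angular data a single dummy node with unit weight is used.
    u_nodes = simpson_nodes(-1.0, 1.0, n_u) if use_angle else [(0.0, 1.0)]
    info = [[0.0] * 3 for _ in range(3)]
    for t, w_t in simpson_nodes(0.0, t_max, n_t):
        e1 = math.exp(-lam1 * t)
        e2 = math.exp(-lam2 * t)
        for u, w_u in u_nodes:
            a1 = 0.375 * (1.0 + u * u) if use_angle else 1.0
            a2 = 0.75 * (1.0 - u * u) if use_angle else 1.0
            g1 = a1 * lam1 * e1
            g2 = a2 * lam2 * e2
            p = beta * g1 + (1.0 - beta) * g2
            if p <= 0.0:
                continue
            A = a1 * (1.0 - lam1 * t) * e1      # d g1 / d lam1
            B = a2 * (1.0 - lam2 * t) * e2      # d g2 / d lam2
            C = g1 - g2
            mix = beta * A + (1.0 - beta) * B
            w = (mix, beta * (1.0 - beta) * (A - B), (lam2 - lam1) * mix + C)
            for i in range(3):
                for j in range(3):
                    info[i][j] += w_t * w_u * w[i] * w[j] / p
    return info

def inverse_diagonal(m):
    """Diagonal of the inverse of a symmetric 3x3 matrix, via cofactors."""
    (a, b, c), (_, d, e), (_, _, f) = m
    det = a * (d * f - e * e) - b * (b * f - c * e) + c * (b * e - c * d)
    return [(d * f - e * e) / det, (a * f - c * c) / det, (a * d - b * b) / det]

def variance_ratios(lam1, lam2, beta):
    """V(theta_i | time only) / V(theta_i | time and angle) for i = 1, 2, 3."""
    v_t = inverse_diagonal(information_theta(lam1, lam2, beta, use_angle=False))
    v_tu = inverse_diagonal(information_theta(lam1, lam2, beta, use_angle=True))
    return [vt / vtu for vt, vtu in zip(v_t, v_tu)]
```

For example, `variance_ratios(1.0, 0.8, 0.9)` evaluates the three ratios at $\Gamma_H/\Gamma_L = 0.8$ and $\beta = 0.9$. Because the time-only information matrix is nearly singular when the two widths are close, the result is sensitive to the quadrature accuracy, which is why Simpson's rule is used here rather than a cruder rule.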

Whereas most of the information about $\theta_1$ is indeed in the time
measurements as expected, it can be seen that the variance of the estimates of
$\lambda_1 - \lambda_2$ and $\beta$ is orders of magnitude higher if we have
only the time information than if we had both the time and angle information.
The physical region of interest lies around $0.8 < \beta$ and
$0.8 < \Gamma_H/\Gamma_L$, where the disparity in the two values of variances
is striking. The width of confidence intervals is proportional to the standard
error, which is the square root of the variance of the estimator. Since the
variances of the maximum likelihood estimates and the moment estimates are
inversely proportional to the number of data points, the ratio of the number
of data points needed to have confidence intervals of a given length, with
time data alone instead of time and angle data, is equal to the levels of the
contours in Figs. 2-4. Looking at the upper right hand corner of the plot in
Fig. 3, we can see that with the inclusion of the angular information, the
sample sizes required for estimation of $\Gamma_L - \Gamma_H$ to a desired
level of accuracy will be smaller by a factor of at least 10, or even more
than 100, compared to those required with the time information alone.

The analysis of the data using the angular information available is, there- 
fore, highly recommended. 
4 The method of angular moments for extracting information

Because of the optimality properties that the maximum likelihood estimates
enjoy in large samples, the method of maximum likelihood is widely used.
The likelihood function, indeed, contains all the information available in the
data, and is the most efficient method for summarizing the information in
the data when the form of the probabilistic model for the data is known [11].
However, there are some practical limitations to the likelihood method.
When the number of parameters is large, exploring the likelihood surface is
problematic. Finding the maximum is also difficult. In addition, if proper
care is not taken, misleading results may be obtained (see for example, [12],
chapter on "Non-Linear Statistical Methods"), and when there are random
errors in the measurement process, the likelihood function may not be
computable (see section 4.3).

We, therefore, propose the method of angular moments, which sacrifices
some information (as compared to the likelihood function), but can give
consistent and reliable estimates of the parameters in a clear way.

In the following section, we will show that in the case of the one-angle
distribution at least, the angular moments method is almost as efficient as
the maximum likelihood method. In sections 4.2 and 4.3 we will discuss the
effects of imperfections in the measurement process on both the method of
angular moments and the likelihood method.



4.1 The efficiency of the method of angular moments 

The method of angular moments is described in [3]. It involves finding a set
of weighting functions $w_i(u)$ such that, given an angular distribution
$f(u) = \sum_i b_i f_i(u)$, where $u$ is the vector of angular variables,

\[
E(w_i(u)) = b_i ,
\tag{20}
\]

where $E$ stands for the expectation value. Such a set of weighting functions
always exists || if the angular distribution is of the form mentioned above.
An estimate of $b_i$ is then

\[
\hat b_i = \frac{1}{n} \sum_{j=1}^{n} w_i(u_j) .
\tag{21}
\]

The estimate is unbiased, i.e. $E(\hat b_i) = b_i$, and its standard error is
equal to $\sigma_i/\sqrt{n}$, where

\[
\sigma_i^2 = \int (w_i(u) - b_i)^2 f(u)\, du
 = \left( \int (w_i(u))^2 f(u)\, du \right) - \left( \int w_i(u) f(u)\, du \right)^2 .
\tag{22}
\]

The information in a sample of size $n$ on $b_i$ can be measured by the
inverse of the variance of $\hat b_i$ and is equal to $n/\sigma_i^2$. The
information per observation is then $1/\sigma_i^2$. To compare the angular
moments method with the likelihood method we will compare the information per
observation with that of the likelihood method, as defined in the previous
section.
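
The estimator in Eq. (21) and its standard error can be sketched in a few lines. The example below uses a toy one-angle distribution $f(u) = (1 + b\,u)/2$ on $[-1, 1]$, which is an assumption made purely for illustration and is not a distribution from this paper; for it, $w(u) = 3u$ satisfies $E(w) = b$. The function names are hypothetical.

```python
import math
import random

def moment_estimate(samples, weight):
    """Return (estimate, standard_error) of E[weight(u)] from a list of samples."""
    n = len(samples)
    values = [weight(u) for u in samples]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

def sample_linear(b, n, rng):
    """Rejection-sample n values of u in [-1, 1] from f(u) = (1 + b*u)/2, |b| <= 1."""
    out = []
    while len(out) < n:
        u = rng.uniform(-1.0, 1.0)
        # Accept with probability proportional to 1 + b*u (envelope 1 + |b|).
        if rng.uniform(0.0, 1.0 + abs(b)) <= 1.0 + b * u:
            out.append(u)
    return out

rng = random.Random(12345)
samples = sample_linear(0.4, 20000, rng)
b_hat, b_err = moment_estimate(samples, lambda u: 3.0 * u)
```

For this toy case $\sigma^2 = E(9u^2) - b^2 = 3 - b^2$, so with $b = 0.4$ and $n = 20000$ the standard error should be close to $\sqrt{2.84/20000} \approx 0.012$. No fit or minimization is involved, whatever the number of coefficients $b_i$.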

Let us take the example of the transversity angle distribution without
any time information. The density is given by (10). We shall see that in
this case, the method of angular moments performs almost as well as the
maximum likelihood estimate in the region of interest to us.

The information content in the maximum likelihood method is as given in
Eq. (35). For the method of angular moments, the density of $u = \cos\theta$
is

\[
p(u) = \frac{3}{8}\,\big[\beta(3u^2 - 1) + 2(1 - u^2)\big] .
\]

The weighting function for $\beta$ may be chosen to be $w(u) = 5u^2 - 1$, so
that $E(w) = \beta$ and $E(w^2) = (24/7)\beta + (8/7)(1-\beta)$. The ratio

\[
I(\beta|u)_{AM} / I(\beta|u)_{ML}
\]

(where AM represents the method of angular moments and ML represents
the maximum likelihood method) is plotted in Figure ||.

The plot shows that for $\beta > 0.3$, this ratio is more than 0.9.
The expected value of $\beta$ ($0.8 < \beta < 1.0$) is well within this range.
Thus, in the physical region of interest, the method of angular moments seems
to perform almost as well as the maximum likelihood fit.
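
The efficiency ratio discussed above can be evaluated directly. In the sketch below, the angular-moments information is $1/\sigma^2$ with $\sigma^2 = E(w^2) - \beta^2$ from the moments quoted in the text, and the likelihood information is taken to be the usual Fisher information $\int (\partial p/\partial\beta)^2 / p \; du$ (an assumption, since Eq. (35) of the appendix is not reproduced in this section), computed with Simpson's rule. The function names are illustrative.

```python
def info_angular_moments(beta):
    """Information per event on beta from w(u) = 5u^2 - 1: 1 / (E(w^2) - beta^2)."""
    second_moment = (24.0 / 7.0) * beta + (8.0 / 7.0) * (1.0 - beta)
    return 1.0 / (second_moment - beta ** 2)

def info_max_likelihood(beta, n=2000):
    """Fisher information on beta for p(u) = (3/8)[beta(3u^2-1) + 2(1-u^2)]."""
    h = 2.0 / n
    total = 0.0
    for k in range(n + 1):
        u = -1.0 + k * h
        dp = 0.375 * (3.0 * u * u - 1.0)                       # dp/dbeta
        p = 0.375 * (beta * (3.0 * u * u - 1.0) + 2.0 * (1.0 - u * u))
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)  # Simpson weights
        if p > 0.0:
            total += weight * dp * dp / p
    return total * h / 3.0

def efficiency(beta):
    """Ratio I(beta|u)_AM / I(beta|u)_ML; cannot exceed 1 for an unbiased estimator."""
    return info_angular_moments(beta) / info_max_likelihood(beta)
```

At $\beta = 1$ both quantities have closed forms, $7/17$ for the angular moments and $(3/8)(8\pi - 24)$ for the likelihood, giving a ratio of about 0.97, which is consistent with the statement that the ratio exceeds 0.9 in the region of interest.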

When we move to a higher number of parameters, the maximum like- 
lihood method will try to maximize the multidimensional likelihood func- 
tion and the complexity of the method increases rapidly with the number 
of dimensions. The dimension of the parameter space does not affect the 
implementation of angular moment method. Therefore it is useful, at the 
least, as a method for providing good initial estimates. Additionally, if it 
is as efficient compared to the full likelihood method as the one-angle case 
suggests, it could render the maximum likelihood method unnecessary. 
4.2 The effect of measurement discretization

So far we have assumed that the data are measured to the maximum precision
available. In practice, that is not the case and the data are usually
reported as the midpoint of the interval in which the measurement actually
fell. For example, if we are measuring a random variable $X$ to a precision
$h$, that means that all observations falling in the interval
$(x^* - h/2,\ x^* + h/2]$ are reported as $x^*$. The resulting discretization
of measurements can lead to a systematic bias in measurements, because instead
of recording the random variable $X$, we are recording the derived random
variable $X^*$ with the probability distribution given by

\[
P(X^* = x^*) = \int_{x^*-h/2}^{x^*+h/2} f(x)\, dx .
\]

Let us compare the means of the true and derived random variables, i.e.,
$E(X)$ and $E(X^*)$. It suffices to compare terms of the form

\[
\int_{x^*-h/2}^{x^*+h/2} x f(x)\, dx \quad \text{and} \quad
x^* \int_{x^*-h/2}^{x^*+h/2} f(x)\, dx .
\]

Now,

\[
\int_{x^*-h/2}^{x^*+h/2} t f(t)\, dt - x^* \int_{x^*-h/2}^{x^*+h/2} f(t)\, dt
 = \int_{x^*-h/2}^{x^*+h/2} (t - x^*) f(t)\, dt
\]
\[
 = \int_{x^*-h/2}^{x^*+h/2} (t - x^*)\big(f(x^*) + (t - x^*) f'(x^*) + O((t - x^*)^2)\big)\, dt
\]
\[
 \approx \int_{-h/2}^{+h/2} \big(t f(x^*) + t^2 f'(x^*)\big)\, dt = f'(x^*)\,\frac{h^3}{12} .
\]

Thus, the error in discretization is of the order of the cube of the length of
the discretization interval if the density is "well-behaved". If the
(effective) support of the distribution of $X$ is $[a, b]$, and the precision
is $h$, then using a crude bound we would get

\[
|E(X) - E(X^*)| \le \frac{b-a}{h}\,\frac{h^3}{12} \left( \sup_{x \in [a,b]} |f'(x)| \right)
 = (b-a)\,\frac{h^2}{12} \left( \sup_{x \in [a,b]} |f'(x)| \right) .
\tag{23}
\]

The bound has obvious modifications when the discretization is done over
intervals of varying length.

Thus the bias due to discretization is of the order of the square of the bin
widths. If the bin widths are sufficiently small, neither the moment method
nor the likelihood method will be affected significantly.
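
The quadratic dependence of the bias on the bin width can be illustrated with an exponential decay-time distribution, for which the bin probabilities are known in closed form. The sketch below assumes bins $[kh, (k+1)h)$ reported at their midpoints; the function names and the choice of distribution are assumptions made for the example.

```python
import math

def discretized_mean(rate, h):
    """E(X*) for X ~ Exp(rate) when each value is reported as its bin midpoint."""
    n_bins = int(60.0 / (rate * h)) + 1   # bins beyond ~60 mean lifetimes are negligible
    total = 0.0
    for k in range(n_bins):
        lo, hi = k * h, (k + 1) * h
        prob = math.exp(-rate * lo) - math.exp(-rate * hi)   # P(lo <= X < hi)
        total += 0.5 * (lo + hi) * prob
    return total

def discretization_bias(rate, h):
    """E(X*) - E(X), with E(X) = 1/rate for the exponential distribution."""
    return discretized_mean(rate, h) - 1.0 / rate

bias_coarse = discretization_bias(1.0, 0.1)
bias_fine = discretization_bias(1.0, 0.05)
```

For unit rate the bias is very close to $h^2/12$, so halving the bin width reduces it by a factor of about four, in line with the $h^2$ scaling of the bound in Eq. (23).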

4.3 The effect of random error in measurements 

Another source of error in measurements comes from errors in the measuring
instruments. Suppose the true variable we want to measure is $X$, but instead,
due to random error we measure

\[
Y = X + E ,
\]

where the error $E$ has density $f_E(\cdot)$ and is independent of $X$. It is
reasonable to assume that the random error has mean 0. Suppose its variance
is $\sigma_E^2$. Then, $E(Y) = E(X)$, but $V(Y) = V(X) + \sigma_E^2$. In other
words, the mean of our measurements is unchanged by the random error, but the
variance is increased. The implication is that the method of angular moments
is unaffected by random error as far as the estimation goes. However, the
standard errors of the estimates will be increased.

The effect of random error on the likelihood method is more serious, because
it relies on the exact mathematical form of the density of the observed
measurements. For example, when we are measuring time and the transversity
angle, the density of the data will no longer be (6) but

\[
p(u,t\,|\,\beta,\Gamma_H,\Gamma_L) = \int\!\!\int
\left( \frac{3}{8}\,\beta\,\Gamma_L (1+u_*^2)\, e^{-\Gamma_L t_*}
 + \frac{3}{4}\,(1-\beta)\,\Gamma_H (1-u_*^2)\, e^{-\Gamma_H t_*} \right)
 \times f_E(u - u_*,\ t - t_*)\; du_*\, dt_* .
\tag{24}
\]

In general, the likelihood function in the presence of random noise is in the
form of an integral with respect to the noise terms which may be analytically
intractable unless the distribution of the noise term is known and is in a
simple form.

If the noise term is believed to be significant, then it may be better to
use the method of moments because it is robust to the presence of additional
random noise.
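
The statement that zero-mean random error leaves the mean unchanged but inflates the variance by $\sigma_E^2$ can be illustrated with a small simulation. The sketch below assumes an exponential true variable and Gaussian noise with $\sigma_E = 0.3$; both choices, the seed and the variable names are arbitrary assumptions for the example.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2024)
n = 50000
sigma_e = 0.3                                        # assumed noise standard deviation
x = [rng.expovariate(1.0) for _ in range(n)]         # true values X
y = [xi + rng.gauss(0.0, sigma_e) for xi in x]       # measured values Y = X + E

mean_shift = sample_mean(y) - sample_mean(x)                 # expected to be near 0
variance_increase = sample_variance(y) - sample_variance(x)  # expected near sigma_e**2
```

The same argument applies to the average of a weighting function only when the weighting function is linear in the smeared variable; for nonlinear weights the smearing should be checked case by case.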
5 Three-angle distribution

Given the enormous additional amount of information available in the angular
data $\theta$ as compared to the time data alone, we expect that the
information embedded in the measurements of the two additional physical angles
$\psi$ and $\varphi$ would be useful in reducing the uncertainty on the
parameters which can in principle be measured by the time and transversity
angle data. Moreover, the additional terms available for measurement in the
three-angle case [see Table 1] allow us access to additional parameters. The
quantities $A_\parallel/A_0$, $A_\perp/A_0$ (both magnitudes and phases),
$\Delta m$ and the CP-violating parameter $\delta\phi$ need the measurement of
these two extra angles. The CP asymmetry

\[
(e^{-\Gamma_H t} - e^{-\Gamma_L t})\, \cos(\delta_i)\, \delta\phi
\tag{25}
\]

can be measured even without tagging (without knowing whether the initial
particle was a $B_s$ or $\bar{B}_s$) as long as we have this information.
Using all the angular data, therefore, is highly recommended. Here, we perform
some Monte Carlo simulations to estimate how well the above parameters will be
known in the next few years.

At the end of CDF run II (expected integrated luminosity ~ 2 fb$^{-1}$),
we should have around 9000 fully reconstructed
$B_s \to J/\psi(\to \ell^+\ell^-)\,\phi(\to K^+K^-)$ events [13], whereas this
number is expected to increase by a factor of at least 15 (just due to the
integrated luminosity improvement) with TeV33. Sets of 10,000 and 100,000
events were generated, with the accuracy in the measurements of time and
angles taken at $\Delta t = 0.1/\bar\Gamma$ and $\Delta\theta = 0.005$. The
method of angular moments and time moments$^4$ was used to recalculate all
the input parameters (with only the data, without any external information)
and histograms were plotted for the recalculated parameters. The simulations
use the following set of parameters:

\[
\frac{\Gamma_H}{\Gamma_L} = 0.8, \quad \left|\frac{A_\parallel}{A_0}\right| = 0.55, \quad
\left|\frac{A_\perp}{A_0}\right| = 0.45, \quad \frac{\Delta m}{\bar\Gamma} = 10.0, \quad
\delta_1 = 0.5, \quad \delta_2 = 2.5 .
\]

This choice of parameters is consistent with the corresponding ones reported
in [10] for $B \to J/\psi K^*$ and flavor SU(3) (except for the lifetime
difference). It is seen that varying these parameters does not change the
essential conclusions.

4 The $n$th time moment of a quantity $Q(t)$ is defined as $\int_0^\infty dt\; t^n Q(t)$. The zeroth
time moment is just the time-integrated quantity.

Figures 6-9 show the results of these simulations. The Y-axis has been
normalized to get the 'relative frequency density', such that the area under
each histogram is equal. The following observations should be noted.

• As can be seen from Fig. 6, the values of $\Gamma_H$ and $\Gamma_L$ are
  well-separated in the first stage (10,000 events) itself. With 100,000
  events, the difference $\Gamma_L - \Gamma_H$ can be determined to nearly
  $\pm 0.05\,\bar\Gamma$ to more than 95% confidence level. By virtue of the
  Central Limit Theorem, the method of moments estimates are approximately
  Gaussian in large samples. The visual appearance of the histograms is
  consistent with this theoretical property. The width of the Gaussian
  distribution of $\Gamma_L - \Gamma_H$ is then (approximately) inversely
  proportional to $\beta(1-\beta)$ [see Sec. 3] and has a weak dependence on
  the actual value of $\Gamma_L - \Gamma_H$ as long as $\Gamma_L - \Gamma_H$
  is small, which is the case here. So the above quantitative inferences from
  this histogram should stay valid even with a smaller value of
  $\Gamma_L - \Gamma_H$. Determination of $1 - \Gamma_H/\Gamma_L$ to 0.05 is
  thus within reach. Even the small lifetime difference predicted recently in
  [14] may be probed with this.

• The accuracy in the measurements of $|A_\parallel/A_0|$ and
  $|A_\perp/A_0|$ is as indicated in Figs. 7 and 8 respectively. The
  predictions of form factor models [15] can thus be directly tested here.



• The signs of $\cos(\delta_1)$ and $\cos(\delta_2)$ are important in order to
  resolve a discrete ambiguity in the CKM angle $\beta$, as pointed out
  recently [16]. In fact, if $\delta\phi$ is small ($\approx 0.03$) as
  predicted by the standard model, these signs may be obtained without any
  time measurements as follows. With $\delta\phi$ neglected, the
  time-integrated angular moments of the "Im" terms in Table 1 give

  \[
  -\cos(\delta_i + \kappa)\; \Gamma / \sin\kappa ,
  \tag{26}
  \]

  where $\kappa = \tan^{-1}(\Gamma/\Delta m)$ and $X \in \{0, \parallel\}$.
  Since $\sin\kappa$ is positive, the sign of these moments immediately gives
  the sign of $\cos(\delta_i + \kappa)$, and given an upper limit (of
  $\approx 0.1$) on the value of $\kappa$, will give the sign of
  $\cos(\delta_i)$ as long as the value of this moment (26) is not close to
  zero. Thus, just the sign of the angular moments of the "Im" terms in
  Table 1 would be sufficient to resolve a discrete ambiguity in $\beta$. The
  relevant angular moments (time integrated) are shown in Fig. [5]. The widths
  of these moment histograms depend only weakly on the actual parameter
  values and the plot can be used as a guide to estimate the errors on
  the values of these moments for any other parameter values.

• When $\Gamma_H \approx \Gamma_L$,

$$ \int_0^\infty \left( e^{-\Gamma_H t} - e^{-\Gamma_L t} \right) dt \approx (\Gamma_L - \Gamma_H)/\Gamma^2 . \tag{27} $$

The ability to measure $\Gamma_L - \Gamma_H$, combined with the measurement of
the time-integrated CP asymmetry in Eq. (25) (even without tagging), would give
a lower bound on $\delta\phi$. A high value of $\delta\phi$ would be a clear
signal of physics beyond the standard model. In the next generation of
experiments (TeV33 or LHC), accurate values of $\delta_i$ ($i = 1,2$) will be
obtained and $\delta\phi$ can be pinpointed.
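
The sign argument around Eq. (26) and the approximation in Eq. (27) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it integrates the $\delta\phi \to 0$ time dependence of the "Im" rows of Table 1, $e^{-\Gamma t}\sin(\delta - \Delta m\, t)$, without the amplitude prefactor, and compares only its sign with that of $-\cos(\delta + \kappa)$. The numerical inputs ($\Gamma = 1$, $\Delta m/\Gamma = 15$, the list of phases, and the two widths) are hypothetical illustration values, and $\Gamma$ in Eq. (27) is assumed to be the average of $\Gamma_H$ and $\Gamma_L$.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def im_moment(delta, gamma, dm, n=400_001):
    """Time integral of exp(-gamma*t) * sin(delta - dm*t): the 'Im' rows of
    Table 1 with delta-phi neglected and the amplitude prefactor dropped."""
    t = np.linspace(0.0, 40.0 / gamma, n)  # the exp(-40) tail is negligible
    return trapezoid(np.exp(-gamma * t) * np.sin(delta - dm * t), t)

# Hypothetical illustration values: Gamma = 1, Delta m / Gamma = 15.
gamma, dm = 1.0, 15.0
kappa = np.arctan(gamma / dm)

sign_checks = []
for delta in (0.3, 1.2, 2.0, 3.0):  # assumed strong phases, for illustration
    moment = im_moment(delta, gamma, dm)
    target = -np.cos(delta + kappa)
    sign_checks.append(np.sign(moment) == np.sign(target))
    print(f"delta={delta:.1f}  moment={moment:+.4f}  -cos(delta+kappa)={target:+.4f}")

# Eq. (27): exact integral 1/Gamma_H - 1/Gamma_L versus (Gamma_L - Gamma_H)/Gamma^2,
# with Gamma assumed to be the average of the two widths.
gamma_h, gamma_l = 0.95, 1.05
exact = 1.0 / gamma_h - 1.0 / gamma_l
approx = (gamma_l - gamma_h) / gamma**2
rel_err = abs(exact - approx) / exact
print(f"Eq. (27): exact={exact:.5f}  approx={approx:.5f}  rel. error={rel_err:.4f}")
```

Only the signs are compared in the first check, so the sketch does not depend on the normalization convention of the moments.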

Feasibility studies for the measurement of $\Delta m/\Gamma$ and the asymmetries
in this decay mode using the angular moments method and weighting functions
have been made in [17] for the CMS detector, which claim that with
$L \approx 10\ \mathrm{fb}^{-1}$, reasonable sensitivity on the oscillations
will be obtained at $\Delta m/\Gamma < 40$. The angular moments method has also
been used for the analysis of $B^0 \to D^{*-}\rho^+$ and
$B^+ \to D^{*0}\rho^+$ [18], and the error estimation (Tables III and IV there)
indicates that the angular moments method is almost as efficient as the best
fit method in estimating the observables, even with the three-angle
distribution.

6 Summary and Conclusions 

Using a 'reasonable' method for quantifying the information in the data, we
have shown that the information content in the data may increase by orders of
magnitude in the region of interest if angular information is added to the time
information. This is true even if the quantity to be measured, e.g. the lifetime
difference between $B_s^H$ and $B_s^L$, has no direct angular dependence. We
have also isolated 'averaged' quantities for which this increase of information
is small, which means that their measurements are not helped much by the
angular data.

The actual use of the angular data involves the choice of a statistical 
method to summarize the data. The standard maximum likelihood method 
is theoretically the "best" when the number of data points is very large. 
However, when the number of parameters to be estimated is large, the nu- 
merical maximization of the likelihood may be difficult, and if proper care 
is not taken, misleading results may be obtained. Also, if there are random 
errors in the measurement process, then the likelihood function would be an 
integral that may not be mathematically known or, if known, not evaluable 
in a closed form. 

The method of angular moments is very straightforward to implement, 
and the connections to the parameters to be determined are more transpar- 
ent. It is consistent in the statistical sense that, with infinite data, it will nail 
the parameters down. Unlike the maximum likelihood method, it is robust 
under random errors of measurement. In the one angle case at least, as we 
have shown explicitly, it is almost as efficient as the maximum likelihood 
method in the region of interest. Both methods are subject to discretization 
errors which will be small if the interval of discretization is small. We there- 
fore recommend the use of the method of angular moments for extracting 
information, at least for the initial estimates. If necessary, they can be re- 
fined with the likelihood method. Even if the maximum likelihood method 
is used, optimization routines require consistent starting values which can be 
provided by the method of angular moments. 
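
The comparison between the two methods in the one-angle case can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the analysis of the paper: the angular shapes `g_even` and `g_odd` are the standard transversity-angle forms $(3/8)(1+u^2)$ and $(3/4)(1-u^2)$ with $u = \cos\theta$, assumed here because the excerpt does not list them; $\beta$ is taken as the fraction of the first component; and the weighting function $w(u) = 5u^2 - 1$ is one choice for which the mean of $w$ equals $\beta$. The function names and the values $\beta = 0.8$ and 20,000 events are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def g_even(u):
    """Assumed angular shape of the first component, normalized on [-1, 1]."""
    return 0.375 * (1.0 + u**2)

def g_odd(u):
    """Assumed angular shape of the second component, normalized on [-1, 1]."""
    return 0.75 * (1.0 - u**2)

def mixture(u, beta):
    return beta * g_even(u) + (1.0 - beta) * g_odd(u)

def sample_u(beta, n, rng):
    """Rejection sampling of u = cos(theta); the mixture never exceeds 0.75."""
    accepted = np.empty(0)
    while accepted.size < n:
        u = rng.uniform(-1.0, 1.0, size=2 * n)
        keep = rng.uniform(0.0, 0.75, size=u.size) < mixture(u, beta)
        accepted = np.concatenate([accepted, u[keep]])
    return accepted[:n]

def beta_from_moment(u):
    """Angular-moment estimate: the sample mean of w(u) = 5 u^2 - 1."""
    return float(np.mean(5.0 * u**2 - 1.0))

def moment_efficiency(beta, n_grid=200_001):
    """Ratio of the Cramer-Rao bound to the variance of the moment estimate."""
    u = np.linspace(-1.0, 1.0, n_grid)
    f = mixture(u, beta)
    fisher = trapezoid((g_even(u) - g_odd(u))**2 / f, u)  # per-event information
    var_w = trapezoid((5.0 * u**2 - 1.0)**2 * f, u) - beta**2
    return 1.0 / (fisher * var_w)

rng = np.random.default_rng(1)
beta_true = 0.8  # hypothetical value
beta_hat = beta_from_moment(sample_u(beta_true, 20_000, rng))
eff = moment_efficiency(beta_true)
print(f"beta_hat = {beta_hat:.3f}, efficiency relative to the bound = {eff:.3f}")
```

An efficiency close to one means the moment estimate loses little information relative to the best attainable variance, which is the sense in which the method is described above as almost as efficient as the likelihood method. The efficiency is evaluated only for $0 < \beta < 1$, since the per-event information in this toy model diverges at $\beta = 0$.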

We have used the angular moments method on simulated sets of data to
estimate the accuracy to which it may determine the quantities of interest.
In the case of the decay $B_s \to J/\psi(\to \ell^+\ell^-)\phi(\to K^+K^-)$, we
find that in the first stage of experiments (CDF II), this method should be
sufficient to give reliable values of $\Gamma_L - \Gamma_H$,
$|A_\parallel/A_0|$ and $|A_\perp/A_0|$. This, combined with the untagged CP
asymmetry measured through the same decay, would give a lower bound for
$\delta\phi$, which is expected to be very small in the standard model. The
signs of $\cos(\delta_1)$ and $\cos(\delta_2)$, which are useful in resolving a
discrete ambiguity in the CKM angle $\beta$, can be determined in the next stage
(TeV33), along with a more accurate determination of $\delta\phi$, which may
point unambiguously to new physics.

Appendix

A Quantifying information in an experiment 

The notion of "information" that we have used in this paper is derived from
statistical theory. A good reference is [11]. Suppose an experiment is
performed to determine the value of the parameter $\alpha$. The (average)
information in the experiment to discriminate between different possible values
of the parameter, when the true value of the parameter is $\alpha_0$, is
measured by

$$ I(\alpha_0, X) = -\int \ddot{\ell}(\alpha_0)\, p(x|\alpha_0)\, dx , \tag{28} $$

often called the expected Fisher information. Here $X$ is the random variable
denoting the data used from the experiment, $p(x|\alpha)$ is the probability
(density) of $X$ given the parameter value $\alpha$,
$\ell(\alpha) = \log(p(x|\alpha))$ is the log likelihood function, and dots
denote derivatives with respect to $\alpha$. Note the dependence of the expected
information in the experiment on the true value of the parameter, $\alpha_0$,
and on the data used, $X$. Both $\alpha$ and $X$ may be vector-valued. This
measure of information possesses the additivity property, i.e. if $X_1$ and
$X_2$ denote data from two independent experiments about the same parameters,
then

$$ I(\alpha_0, (X_1, X_2)) = I(\alpha_0, X_1) + I(\alpha_0, X_2) . $$

In particular this implies that if $n$ independent and identically distributed
data points, $X_1, X_2, \ldots, X_n$, are collected from an experiment, the
expected information in the whole experiment is $n$ times the expected
information in one observation:

$$ I(\alpha_0, (X_1, X_2, \ldots, X_n)) = n\, I(\alpha_0, X_1) . $$

Additionally, it can be shown that for any unbiased estimator of $\alpha$, say
$\hat{\alpha}_n$, based on a sample $X_1, X_2, \ldots, X_n$ (the Cramér-Rao
inequality),

$$ V(\hat{\alpha}_n) \ge \frac{1}{I(\alpha_0, (X_1, X_2, \ldots, X_n))} = \frac{1}{n\, I(\alpha_0, X_1)} , \tag{29} $$

where $V$ is the variance and $\hat{\alpha}_n$ is based on a sample of size $n$.
When the sample size is large and certain regularity conditions hold, the lower
bound on the variance is achieved by the maximum likelihood estimate. It is in
this sense that the maximum likelihood estimate is the "best".
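
As a concrete illustration of Eq. (29), consider a single exponential decay-time distribution $p(t|\Gamma) = \Gamma e^{-\Gamma t}$, for which the expected Fisher information per event is $1/\Gamma^2$ and the maximum likelihood estimate is the inverse of the mean decay time. The sketch below is a toy example with hypothetical sample sizes, not part of the paper's simulations. It compares the scatter of the maximum likelihood estimate over many pseudo-experiments with the bound $\Gamma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma_true = 1.0      # true decay width (arbitrary units, illustration only)
n_events = 1000       # events per pseudo-experiment
n_experiments = 2000  # number of pseudo-experiments

# Decay times for all pseudo-experiments at once; the scale is the mean lifetime.
t = rng.exponential(1.0 / gamma_true, size=(n_experiments, n_events))

# Maximum likelihood estimate of Gamma for each pseudo-experiment.
gamma_hat = 1.0 / t.mean(axis=1)

# Cramer-Rao bound: 1 / (n * I), with I = 1 / Gamma^2 per event.
cramer_rao_bound = gamma_true**2 / n_events

ratio = gamma_hat.var() / cramer_rao_bound
print(f"variance of the ML estimate / Cramer-Rao bound = {ratio:.3f}")
```

For large samples the ratio is expected to be close to one, which is the sense in which the bound is achieved by the maximum likelihood estimate.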

It can also be shown that when we have independent and identically
distributed data points, for large samples,

$$ \ell(\alpha) \approx \ell(\hat{\alpha}) + (\alpha - \hat{\alpha})\, \dot{\ell}(\hat{\alpha}) + \tfrac{1}{2} (\alpha - \hat{\alpha})^2\, \ddot{\ell}(\hat{\alpha}) , \tag{30} $$

where $\hat{\alpha}$ denotes the maximum likelihood estimate of $\alpha$. By
construction, $\dot{\ell}(\hat{\alpha}) = 0$ and hence

$$ \ell(\alpha) \approx \ell(\hat{\alpha}) + \tfrac{1}{2} (\alpha - \hat{\alpha})^2\, \ddot{\ell}(\hat{\alpha}) . \tag{31} $$

In other words, the log likelihood surface is approximately quadratic and its
shape can be described by the position of the maximum ($\hat{\alpha}$) and the
curvature of the log likelihood in the neighbourhood of the maximum
($\ddot{\ell}(\hat{\alpha})$). The latter describes how fast the log likelihood
falls off; the larger the magnitude of $\ddot{\ell}(\hat{\alpha})$, the steeper
the fall and the stronger the evidence in favour of values near the maximum.
For this reason, $-\ddot{\ell}(\hat{\alpha})$ is also used as a measure of
information, but since it varies from sample to sample, it is called the
observed Fisher information. Its average value is the expected Fisher
information mentioned above.

While we have defined the information for a scalar parameter, the general
idea can be extended to vector-valued parameters. When there are two or
more parameters, the appropriate measure of expected information is the
expected information matrix, which is the expected value of the (negative)
Hessian of the log likelihood, as in Eq. (28). See [11] for details and
additional references.
Table Captions

Table 1. : Time evolution of the decay
$B_s \to J/\psi(\to \ell^+\ell^-)\phi(\to K^+K^-)$ of an initially (i.e. at
$t = 0$) pure $B_s$ meson.

Figure Captions

Fig. 1. : The ability to guess the decay type in four scenarios: Plots of
the posterior probability that a decay is of the first type (with mean lifetime
$1/\Gamma_L$) given

• only angular information, $u = \cos(\theta)$, (solid line)

• only time data, $\Gamma_H/\Gamma_L = 1$, (dotted line)

• only time data, $\Gamma_H/\Gamma_L = 1.2$, (narrowest dashed line)

• only time data, $\Gamma_H = 20$. (broad dashed line)

Fig. 2. : The ratio of the variances,
$V(\hat{\theta}_1(t))/V(\hat{\theta}_1(u,t))$, of the estimates of
$\theta_1 = \beta\Gamma_L + (1 - \beta)\Gamma_H$.

Fig. 3. : The ratio of the variances,
$V(\hat{\theta}_2(t))/V(\hat{\theta}_2(u,t))$, of the estimates of
$\theta_2 = \Gamma_L - \Gamma_H$.

Fig. 4. : The ratio of the variances,
$V(\hat{\theta}_3(t))/V(\hat{\theta}_3(u,t))$, of the estimates of
$\theta_3 = \beta$.

Fig. 5. : The ratio of information content about $\beta$ extracted through
the angular moments method and the maximum likelihood method. The X-axis is
the actual value of $\beta$.

Fig. 6. : Determination of $\Gamma_H$ and $\Gamma_L$. The X-axis has been
normalized to $\Gamma_L$ (actual) = 1.0.

Fig. 7. : Determination of $|A_\parallel/A_0|$. The solid line is for 10,000
events and the dashed line is for 100,000 events.

Fig. 8. : Determination of $|A_\perp/A_0|$. The solid line is for 10,000
events and the dashed line is for 100,000 events.

Fig. 9. : The moments of the "Im" terms in Eq. (26). mom5 is the value
of $-|A_\parallel|\,|A_\perp| \cos(\delta_1 + \kappa)\, \Gamma/\sin\kappa$ and
mom6 is the value of
$-|A_0|\,|A_\perp| \cos(\delta_2 + \kappa)\, \Gamma/\sin\kappa$.
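Observables                      Time evolutions

$|A_0(t)|^2$                     $|A_0(0)|^2 \left[ e^{-\Gamma_L t} - e^{-\Gamma t} \sin(\Delta m t)\, \delta\phi \right]$
$|A_\parallel(t)|^2$             $|A_\parallel(0)|^2 \left[ e^{-\Gamma_L t} - e^{-\Gamma t} \sin(\Delta m t)\, \delta\phi \right]$
$|A_\perp(t)|^2$                 $|A_\perp(0)|^2 \left[ e^{-\Gamma_H t} + e^{-\Gamma t} \sin(\Delta m t)\, \delta\phi \right]$

Re$(A_0^*(t) A_\parallel(t))$    $|A_0(0)|\,|A_\parallel(0)| \cos(\delta_2 - \delta_1) \left[ e^{-\Gamma_L t} - e^{-\Gamma t} \sin(\Delta m t)\, \delta\phi \right]$
Im$(A_\parallel^*(t) A_\perp(t))$   $|A_\parallel(0)|\,|A_\perp(0)| \left[ e^{-\Gamma t} \sin(\delta_1 - \Delta m t) + \tfrac{1}{2} \left( e^{-\Gamma_H t} - e^{-\Gamma_L t} \right) \cos(\delta_1)\, \delta\phi \right]$
Im$(A_0^*(t) A_\perp(t))$        $|A_0(0)|\,|A_\perp(0)| \left[ e^{-\Gamma t} \sin(\delta_2 - \Delta m t) + \tfrac{1}{2} \left( e^{-\Gamma_H t} - e^{-\Gamma_L t} \right) \cos(\delta_2)\, \delta\phi \right]$

Table 1: Time evolution of the decay
$B_s \to J/\psi(\to \ell^+\ell^-)\phi(\to K^+K^-)$ of an initially (i.e. at
$t = 0$) pure $B_s$ meson.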


